Indexing by statistical tagging

Authors

  • Pierrette Bouillon
  • Robert Baud
  • Gilbert Robert
  • Patrick Ruch
Abstract

Lexical ambiguity is a fundamental problem in Information Retrieval (IR), especially in the medical domain. Many systems represent a document's content by a subset of the words it contains, but they then face the problem of ambiguity. In this paper, we propose a disambiguation method based on existing medical terminological resources on the one hand and statistical tools for linguistic annotation on the other, in order to develop more satisfactory indexing techniques for patient reports. The main hypotheses guiding this method are that (i) syntax can help to distinguish the meanings of polyfunctional words; (ii) syntactic analysis can be done by a probabilistic tagger (HMM, Hidden Markov Model); and, more daringly, (iii) the remaining semantic ambiguity can also be solved (mutatis mutandis) by an HMM tagger.
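To make hypothesis (ii) concrete, here is a minimal sketch of how a first-order HMM tagger chooses tags with Viterbi decoding. It is not the authors' system: the tag set, the probability tables, the `floor` smoothing constant, and the example word "discharge" are all illustrative assumptions, since the abstract gives no model details.

```python
import math

# Hypothetical toy model: every tag and probability below is invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.01, "NOUN": 0.9, "VERB": 0.09},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 1.0},
    "NOUN": {"discharge": 0.5, "patient": 0.5},
    "VERB": {"discharge": 0.4, "continues": 0.6},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM."""

    def lp(p):
        # Log-probability, with a small floor standing in for unseen events.
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{
        t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximises path score plus transition score.
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Backtrack from the best final tag to recover the full sequence.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


print(viterbi(["the", "discharge", "continues"], TAGS, START_P, TRANS_P, EMIT_P))
print(viterbi(["discharge", "the", "patient"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy tables, "discharge" is tagged as a noun in the first sentence and as a verb in the second, which is the kind of context-driven choice between functions that hypothesis (i) relies on.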

Similar resources

Collaborative thesaurus tagging the Wikipedia way

This paper explores the system of categories that is used to classify articles in Wikipedia. It is compared to collaborative tagging systems like del.icio.us and to hierarchical classification like the Dewey Decimal Classification (DDC). Specifics and commonalities of these systems of subject indexing are exposed. Analysis of structural and statistical properties (descriptors per record, re...

Hybrid POS tagging with generalized unknown-word handling

This paper presents POSTAG 1 as a statistical/rule-based hybrid part-of-speech (POS) tagging system with generalized unknown-word handling. The POSTAG integrates morphological analysis with statistical POS disambiguation and post rule-based error-correction. The error-correction rules are automatically learned from a tagged corpus and selectively correct standard HMM tagging errors. The morpho...

Optimizing Parsing with Multiple Pipelining

This paper presents a technique for tagging in natural language processing that can enhance the speed and accuracy of part-of-speech tagging in statistical parsing by using the concept of pipelining for fast searching and indexing. The running time of a parser depends on searching for the respective words in the word bank and their respective tags to match against the parse trees stored in the P...

Tagging, Folksonomy & Co - Renaissance of Manual Indexing?

This paper gives an overview of current trends in manual indexing on the Web. Along with a general rise of user generated content there are more and more tagging systems that allow users to annotate digital resources with tags (keywords) and share their annotations with other users. Tagging is frequently seen in contrast to traditional knowledge organization systems or as something completely n...

Intellexer Question Answering

1. Intellexer NL parser 1.1) Tokenizer 1.2) Statistical tagger 1.3) Rule-based tagging corrector (RBTC) 1.4) Chunker 1.5) Lexicalized parser 1.6) Paraphraser 1.7) Generation of terms and pairs 2. Indexing and answering 2.1) Indexing 2.2) Resolving anaphora in questions 2.3) Matching Factoid questions 2.4) Matching List questions 2.5) Matching Other questions 2.6) Difference between runs A, B and C

Persian part-of-speech tagging using a fuzzy network model

Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is to determine the grammatical ...

Publication date: 2000